For this portfolio, I’d like to find a way to include an uncluttered figure that carries a key statistical takeaway. To begin with, I will start with the simple p value. Specifically, I would like to show the p value on the graph itself, since it seems to me that doing so might make for a better, more persuasive visual presentation. So, let’s see what it might look like to show the relationship between the variables and provide a statistical figure on that same graph.

Note: the test I run below is a paired t-test (the call to t.test() sets paired = TRUE), not an independent-samples (Welch’s) t-test. I believe that there are issues with using measures of statistical significance when exhaustive data are analyzed, as with these two variables, which are not a “sample” but rather all available data on the subjects. But for demonstration purposes, I’ll use the p value here.
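To make the difference between the t-test variants concrete, here is a minimal sketch in base R. The vectors `x` and `y` are hypothetical numbers invented for illustration, not the portfolio data, and the vector interface to `t.test()` is used rather than the formula interface.

```r
# Hypothetical yearly counts, invented for illustration only.
set.seed(42)
x <- round(rnorm(17, mean = 300, sd = 300))
y <- round(rnorm(17, mean = 300, sd = 130))

# Welch's t-test: base R's default for two vectors (var.equal = FALSE).
welch <- t.test(x, y)

# Paired t-test: treats x[i] and y[i] as a matched pair (e.g. same year).
paired <- t.test(x, y, paired = TRUE)

# Compare which test was run and the resulting p values.
welch$method
paired$method
c(welch = welch$p.value, paired = paired$p.value)
```

The paired version only makes sense when the two vectors line up observation by observation, which is why the choice between the two matters for yearly series like these.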
r = getOption("repos")
r["CRAN"] = "http://cran.us.r-project.org"
options(repos = r)
install.packages('tidyr')
##
## The downloaded binary packages are in
## /var/folders/ls/z3xhd0bx0jzchy3zd1hkbnhr0000gn/T//Rtmpe4htEW/downloaded_packages
install.packages('readr')
library(ggplot2)
library(tidyr)
install.packages('gapminder')
install.packages('gganimate')
install.packages('gifski')
install.packages('ggrepel')
library(gapminder)
library(gganimate)
library(gifski)
library(ggrepel)
theme_set(theme_bw())
library(readr)
dfMilStatsTest <- read.csv("~/Documents/GitHub/Portfolio6-VizStats/MilStatsExcel.csv")
# Animated line chart: one line per series (Object) over Time
ggplot(dfMilStatsTest, aes(Time, Value, group = Object, color = Object)) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  # Repelled text labels naming each series
  geom_label_repel(aes(x = Time, y = Value, label = Object, fill = Object),
                   hjust = 0, direction = "y", nudge_x = 20,
                   fontface = 'bold',
                   color = 'white', size = 6,
                   segment.color = 'grey50',
                   segment.size = 0.5) +
  # gganimate: reveal the lines progressively along Time
  transition_reveal(Time)
Everything looks good when we knit that. With those preliminaries out of the way, let’s try to work in a p value.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble 3.1.6 ✓ stringr 1.4.0
## ✓ purrr 0.3.4 ✓ forcats 0.5.1
## ✓ dplyr 1.0.8
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggpubr)
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
dfMilStatsTest %>%
  group_by(Object) %>%
  get_summary_stats(Value, type = "mean_sd")
## # A tibble: 2 × 5
## Object variable n mean sd
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 combat_deaths Value 17 306. 313.
## 2 suicides Value 17 297. 132.
res <- t.test(Value ~ Object, data = dfMilStatsTest, paired = TRUE)
res
##
## Paired t-test
##
## data: Value by Object
## t = 0.10087, df = 16, p-value = 0.9209
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -189.5716 208.5128
## sample estimates:
## mean of the differences
## 9.470588
With a p-value of 0.9209 (no statistically significant difference between the two means), let’s see how we can plug that into our graph.
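One way to place a p value directly on a plot is to build a text label from the test result and add it with ggplot2's `annotate()`. The following is only a sketch under stated assumptions: the data frame `demo` is a hypothetical stand-in for the real CSV, the names `res_demo`, `p_label` and `p` are made up for this example, and the label position (top-left corner) is just one possible choice.

```r
library(ggplot2)

# Hypothetical stand-in data; the real analysis reads MilStatsExcel.csv.
set.seed(1)
demo <- data.frame(
  Time   = rep(2000:2016, times = 2),
  Object = rep(c("combat_deaths", "suicides"), each = 17),
  Value  = c(rnorm(17, 300, 300), rnorm(17, 300, 130))
)

# Paired t-test on the two series (rows are already in year order).
cd   <- demo$Value[demo$Object == "combat_deaths"]
suic <- demo$Value[demo$Object == "suicides"]
res_demo <- t.test(cd, suic, paired = TRUE)

# Build the label from the stored p value, rounded to 3 significant digits.
p_label <- paste0("Paired t-test, p = ", signif(res_demo$p.value, 3))

# x = -Inf / y = Inf pin the text to the top-left corner of the panel;
# hjust and vjust nudge it inside the plotting area.
p <- ggplot(demo, aes(Time, Value, colour = Object)) +
  geom_line() +
  geom_point() +
  annotate("text", x = -Inf, y = Inf, label = p_label,
           hjust = -0.1, vjust = 1.5)
p
```

Pulling the value from `res_demo$p.value` rather than typing it by hand keeps the label in sync with the test if the data change.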
An excellent tutorial on correlation analysis for time series data in R: https://www.lobdata.com.br/2020/09/15/how-to-perform-correlation-analysis-in-time-series-data-using-r/
install.packages('feasts')
install.packages('TSstudio')
library(feasts)
## Loading required package: fabletools
library(tsibble)
##
## Attaching package: 'tsibble'
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:tsibble':
##
## interval
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(TSstudio)
library(readr)
forTSdf <- read_csv("~/Documents/GitHub/Portfolio6-VizStats/forTSdf.csv")
## Rows: 17 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Object
## dbl (4): Time, Value, CD, Suic
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(forTSdf)
install.packages("dplyr")
library(dplyr)
TSdf <- forTSdf %>%
  select(Time, CD, Suic)
TSdf2 <- ts(data = TSdf[, c("CD", "Suic")],
            start = c(2000),
            end = c(2016),
            frequency = 1)
ts_info(TSdf2)
## The TSdf2 series is a mts object with 2 variables and 17 observations
## Frequency: 1
## Start time: 2000 1
## End time: 2016 1
ts_plot(TSdf2,
        title = "Military Suicides and Combat Deaths",
        Ytitle = "Suicides and Combat Deaths",
        Xtitle = "Year")
# Autocorrelation of the combat deaths series
TSdf2[, c("CD")] %>%
  acf(lag.max = 300,
      main = "Autocorrelation Plot - CD")
# Autocorrelation of the suicides series
TSdf2[, c("Suic")] %>%
  acf(lag.max = 300,
      main = "Autocorrelation Plot - Suic")
rm(TSdf)
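The autocorrelation calls above request lag.max = 300 on a series with only 17 yearly observations. As a minimal sketch of what `acf()` does in that situation, the example below uses a hypothetical series `demo_ts` (invented numbers, not the portfolio data) and returns the values with `plot = FALSE` instead of drawing them.

```r
# Hypothetical annual series with 17 observations, invented for illustration.
set.seed(7)
demo_ts <- ts(cumsum(rnorm(17)), start = 2000, frequency = 1)

# plot = FALSE returns the autocorrelations instead of drawing the plot.
a <- acf(demo_ts, lag.max = 300, plot = FALSE)

# acf() caps lag.max at n - 1, so only lags 0 through 16 come back.
length(a$lag)

# The lag-0 autocorrelation is the series correlated with itself.
a$acf[1]
```

With so few observations, the higher lags are estimated from very few pairs of points, so they are probably best read with caution.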